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We analysed primary breast cancers by genomic DNA copy number arrays, DNA methylation, exome sequencing, 
messenger RNA arrays, microRNA sequencing and reverse-phase protein arrays. Our ability to integrate information 
across platforms provided key insights into previously defined gene expression subtypes and demonstrated the existence 
of four main breast cancer classes when combining data from five platforms, each of which shows significant molecular 
heterogeneity. Somatic mutations in only three genes (TP53, PIK3CA and GAT A3) occurred at >10% incidence across all 
breast cancers; however, there were numerous subtype -associated and novel gene mutations including the enrichment 
of specific mutations in GAT A3, PIK3CA and MAP3K1 with the luminal A subtype. We identified two novel 
protein- expression- defined subgroups, possibly produced by stromal/ microenvironmental elements, and integrated 
analyses identified specific signalling pathways dominant in each molecular subtype including a HER2/phosphorylated 
HER2/EGFR/phosphorylated EGFR signature within the HER2-enriched expression subtype. Comparison of basal-like 
breast tumours with high-grade serous ovarian tumours showed many molecular commonalities, indicating a related 
aetiology and similar therapeutic opportunities. The biological finding of the four main breast cancer subtypes caused by 
different subsets of genetic and epigenetic abnormalities raises the hypothesis that much of the clinically observable 
plasticity and heterogeneity occurs within, and not across, these major biological subtypes of breast cancer. 



Breast cancer is one of the most common cancers with greater than 
1,300,000 cases and 450,000 deaths each year worldwide. Clinically, 
this heterogeneous disease is categorized into three basic therapeutic 
groups. The oestrogen receptor (ER) positive group is the most 
numerous and diverse, with several genomic tests to assist in predict- 
ing outcomes for ER + patients receiving endocrine therapy 1 ' 2 . The 
HER2 (also called ERBB2) amplified group 3 is a great clinical success 
because of effective therapeutic targeting of HER2, which has led to 
intense efforts to characterize other DNA copy number aberrations 4 ' 5 . 
Triple-negative breast cancers (TNBCs, lacking expression of ER, 
progesterone receptor (PR) and HER2), also known as basal-like 
breast cancers 6 , are a group with only chemotherapy options, and 
have an increased incidence in patients with germline BRCA1 muta- 
tions 7 ' 8 or of African ancestry 9 . 

Most molecular studies of breast cancer have focused on just one or 
two high information content platforms, most frequently mRNA 
expression profiling or DNA copy number analysis, and more 
recently massively parallel sequencing 10 " 12 . Supervised clustering of 
mRNA expression data has reproducibly established that breast 
cancers encompass several distinct disease entities, often referred to 
as the intrinsic subtypes of breast cancer 13,14 . The recent development of 
additional high information content assays focused on abnormalities 
in DNA methylation, microRNA (miRNA) expression and protein 
expression, provide further opportunities to characterize more com- 
pletely the molecular architecture of breast cancer. In this study, a 
diverse set of breast tumours were assayed using six different technology 
platforms. Individual platform and integrated pathway analyses iden- 
tified many subtype- specific mutations and copy number changes that 
identify therapeutically tractable genomic aberrations and other events 
driving tumour biology. 

Samples and clinical data 

Tumour and germline DNA samples were obtained from 825 
patients. Different subsets of patients were assayed on each platform: 



466 tumours from 463 patients had data available on five platforms 
including Agilent mRNA expression microarrays (n = 547), Illumina 
Infinium DNA methylation chips (n = 802), Affymetrix 6.0 single 
nucleotide polymorphism (SNP) arrays (n = 773), miRNA sequencing 
(n = 697), and whole-exome sequencing (n = 507); in addition, 348 of 
the 466 samples also had reverse-phase protein array (RPPA) data 
(n = 403). Owing to the short median overall follow up (17 months) 
and the small number of overall survival events (93 out of 8 18), survival 
analyses will be presented in a later publication. Demographic and 
clinical characteristics are presented in Supplementary Table 1. 

Significantly mutated genes in breast cancer 

Overall, 510 tumours from 507 patients were subjected to whole- 
exome sequencing, identifying 30,626 somatic mutations comprised 
of 28,319 point mutations, 4 dinucleotide mutations, and 2,302 
insertions/deletions (indels) (ranging from 1 to 53 nucleotides). The 
point mutations included 6,486 silent, 19,045 missense, 1,437 
nonsense, 26 read- through, 506 splice-site mutations, and 819 muta- 
tions in RNA genes. Comparison to COSMIC and OMIM databases 
identified 619 mutations across 177 previously reported cancer genes. 
Of 19,045 missense mutations, 9,484 were predicted to have a high 
probability of being deleterious by Condel 15 . The MuSiC package 16 , 
which determines the significance of the observed mutation rate of 
each gene based on the background mutation rate, identified 35 sig- 
nificantly mutated genes (excluding LOC or Ensembl gene IDs) by at 
least two tests (convolution and likelihood ratio tests) with false dis- 
covery rate (FDR) <5% (Supplementary Table 2). 

In addition to identifying nearly all genes previously implicated in 
breast cancer (PIK3CA, PTEN, AKT1, TP53, GAT A3, CDH1, RBI, 
MLL3, MAP3K1 and CDKN1B), a number of novel significantly 
mutated genes were identified including TBX3, RUNX1, CBFB, AFF2, 
PIK3R1, PTPN22, PTPRD, NF1, SF3B1 and CCND3. TBX3, which is 
mutated in ulnar-mammary syndrome and involved in mammary 
gland development 17 , harboured 13 mutations (8 frame-shift indels, 
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1 in-frame deletion, 1 nonsense, and 3 missense), suggesting a loss of 
function. Additionally, 2 mutations were found in TBX4 and 1 muta- 
tion in TBX5, which are genes involved in Holt-Oram syndrome 18 . 
Two other transcription factors, CTCF and FOXA1, were at or near 
significance harbouring 13 and 8 mutations, respectively. RUNX1 and 
CBFB, both rearranged in acute myeloid leukaemia and interfering 
with haematopoietic differentiation, harboured 19 and 9 mutations, 
respectively. PIK3R1 contained 14 mutations, most of which clustered 
in the PIK3CA interaction domain similar to previously identified 
mutations in glioma 19 and endometrial cancer 20 . We also observed a 
statistically significant exclusion pattern among PIK3R1, PIK3CA, 
PTEN and AKT1 mutations (P = 0.025). Mutation of splicing factor 
SF3B1, previously described in myelodysplastic syndromes 21 and 
chronic lymphocytic leukaemia 22 , was significant with 15 non-silent 
mutations, of which 4 were a recurrent K700E substitution. Two 
protein tyrosine phosphatases (PTPN22 and PTPRD) were also signifi- 
cantly mutated; frequent deletion/mutation of PTPRD is observed in 
lung adenocarcinoma 23 . 

Mutations and mRNA-expression subtype associations 

We analysed the somatic mutation spectrum within the context of the 
four mRNA-expression subtypes, excluding the normal-like group 
owing to small numbers (n = 8) (Fig. 1). Several significantly mutated 
genes showed mRNA-subtype-specific (Supplementary Figs 1-3) and 
clinical- subtype-specific patterns of mutation (Supplementary Table 2). 
Significantly mutated genes were considerably more diverse and 
recurrent within luminal A and luminal B tumours than within 
basal-like and HER2-enriched (HER2E) subtypes; however, the overall 
mutation rate was lowest in luminal A subtype and highest in the basal- 
like and HER2E subtypes. The luminal A subtype harboured the most 
significantly mutated genes, with the most frequent being PIK3CA 
(45%), followed by MAP3K1, GAT A3, TP53, CDH1 and MAP2K4. 
Twelve per cent of luminal A tumours contained likely inactivating 
mutations in MAP3K1 and MAP2K4, which represent two contiguous 



steps in the p38-JNKl stress kinase pathway 24 . Luminal B cancers 
exhibited a diversity of significantly mutated genes, with TP53 and 
PIK3CA (29% each) being the most frequent. The luminal tumour 
subtypes markedly contrasted with basal-like cancers where TP53 
mutations occurred in 80% of cases and the majority of the luminal 
significantly mutated gene repertoire, except PIK3CA (9%), were 
absent or near absent. The HER2E subtype, which has frequent 
HER2 amplification (80%), had a hybrid pattern with a high frequency 
of TP53 (72%) and PIK3CA (39%) mutations and a much lower fre- 
quency of other significantly mutated genes including PIK3R1 (4%). 

Intrinsic mRNA subtypes differed not only by mutation frequencies 
but also by mutation type. Most notably, TP53 mutations in basal-like 
tumours were mostly nonsense and frame shift, whereas missense 
mutations predominated in luminal A and B tumours (Supplemen- 
tary Fig. 1). Fifty-eight somatic GAT A3 mutations, some of which were 
previously described 25 , were detected including a hotspot 2 -base-pair 
deletion within intron 4 only in the luminal A subtype (13 out of 13 
mutants) (Supplementary Fig. 2). In contrast, 7 out of 9 frame-shift 
mutations in exon 5 (DNA binding domain) occurred in luminal B 
cancers. PIK3CA mutation frequency and spectrum also varied by 
mRNA subtype (Supplementary Fig. 3); the recurrent PIK3CA 
E545K mutation was present almost exclusively within luminal A 
(25 out of 27) tumours. CDH1 mutations were common (30 out of 
36) within the lobular histological subtype and corresponded with 
lower CDH1 mRNA (Supplementary Fig. 4) and protein expression. 
Finally, we identified 4 out of 8 somatic variants in HER2 within lobular 
cancers, three of which were within the tyrosine kinase domain. 

We performed analyses on a selected set of genes 26 using the normal 
tissue DNA data and detected a number of germline predisposing 
variants. These analyses identified 47 out of 507 patients with 
deleterious germline variants, representing nine different genes 
(ATM, BRCA1, BRCA2, BRIP1, CHEK2, NBN, PTEN, RAD51C and 
TP53; Supplementary Table 3), supporting the hypothesis that —10% 
of sporadic breast cancers may have a strong germline contribution. 
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Percentages of cases with mutation by expression subtype 

Figure 1 | Significantly mutated genes and correlations with genomic and 
clinical features. Tumour samples are grouped by mRNA subtype: luminal A 
(n = 225), luminal B (n = 126), HER2E (n = 57) and basal-like (n = 93). The 
left panel shows non- silent somatic mutation patterns and frequencies for 
significantly mutated genes. The middle panel shows clinical features: dark 
grey, positive or T2-4; white, negative or Tl; light grey, N/A or equivocal. 




| Amplification ■ Deletion 



N, node status; T, tumour size. The right panel shows significantly mutated 
genes with frequent copy number amplifications (red) or deletions (blue). The 
far- right panel shows non- silent mutation rate per tumour (mutations per 
megabase, adjusted for coverage). The average mutation rate for each 
expression subtype is indicated. Hypermutated: mutation rates >3 s.d. above 
the mean (>4.688, indicated by grey line). 
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These data confirmed the association between the presence of 
germline BRCA1 mutations and basal-like breast cancers 7 ' 8 . 

Gene expression analyses (mRNA and miRNA) 

Several approaches were used to look for structure in the mRNA 
expression data. We performed an unsupervised hierarchical cluster- 
ing analysis of 525 tumours and 22 tumour- adjacent normal tissues 
using the top 3,662 variably expressed genes (Supplementary Fig. 5); 
SigClust analysis identified 12 classes (5 classes with >9 samples per 
class). We performed a semi- supervised hierarchical cluster analysis 
using a previously published 'intrinsic gene list' 14 , which identified 13 
classes (9 classes with >9 samples per class) (Supplementary Fig. 6). 
We also classified each sample using the 50-gene PAM50 model 14 
(Supplementary Fig. 5). High concordance was observed between 
all three analyses; therefore, we used the PAM50 -defined subtype 
predictor as a common classification metric. There were only eight 
normal-like and eight claudin-low tumours 27 , thus we did not per- 
form focussed analyses on these two subtypes. 

Micro RNA expression levels were assayed via Illumina sequencing, 
using 1,222 miRBase 28 vl6 mature and star strands as the reference 
database of miRNA transcripts/genes. Seven subtypes were identified 
by consensus non-negative matrix factorization (NMF) clustering 
using an abundance matrix containing the 25% most variable 
miRNAs (306 transcripts/genes or MIMATs (miRNA IDs)). These 
subtypes correlated with mRNA subtypes, ER, PR and HER2 clinical 
status (Supplementary Fig. 7). Of note, miRNA groups 4 and 5 
showed high overlap with the basal-like mRNA subtype and con- 
tained many TP53 mutations. The remaining miRNA groups (1-3, 
6 and 7) were composed of a mixture of luminal A, luminal B and 
HER2E with little correlation with the PAM50 defined subtypes. With 
the exception of TP53 — which showed a strong positive correlation — 
and PIK3CA and GAT A3 — which showed negative associations with 
groups 4 and 5, respectively— there was little correlation with muta- 
tion status and miRNA subtype. 

DNA methylation 

Illumina Infinium DNA methylation arrays were used to assay 802 
breast tumours. Data from HumanMethylation27 (HM27) and 
HumanMethylation450 (HM450) arrays were combined and filtered 
to yield a common set of 574 probes used in an unsupervised clustering 
analysis, which identified five distinct DNA methylation groups 
(Supplementary Fig. 8). Group 3 showed a hypermethylated pheno- 
type and was significantly enriched for luminal B mRNA subtype and 
under-represented for P1K3CA, MAP3K1 and MAP2K4 mutations. 
Group 5 showed the lowest levels of DNA methylation, overlapped 
with the basal-like mRNA subtype, and showed a high frequency of 
TP53 mutations. HER2-positive (HER2 + ) clinical status, or the 
HER2E mRNA subtype, had only a modest association with the 
methylation subtypes. 

A supervised analysis of the DNA methylation and mRNA expres- 
sion data was performed to compare DNA methylation group 3 
(N= 49) versus all tumours in groups 1, 2 and 4 (excluding group 5, 
which consisted predominantly of basal-like tumours). This analysis 
identified 4,283 genes differentially methylated (3,735 higher in group 
3 tumours) and 1,899 genes differentially expressed (1,232 downregu- 
lated); 490 genes were both methylated and showed lower expression 
in group 3 tumours (Supplementary Table 4). A DAVID (database for 
annotation, visualization and integrated discovery) functional annota- 
tion analysis identified 'extracellular region part' and 'Wnt signalling 
pathway' to be associated with this 490 -gene set; the group 3 hyper- 
methylated samples showed fewer PIK3CA and MAP3K1 mutations, 
and lower expression of Wnt-pathway genes. 

DNA copy number 

A total of 773 breast tumours were assayed using Affymetrix 6.0 
SNP arrays. Segmentation analysis and GISTIC were used to 



identify focal amplifications/deletions and arm-level gains and losses 
(Supplementary Table 5). These analyses confirmed all previously 
reported copy number variations and highlighted a number of sig- 
nificantly mutated genes including focal amplification of regions con- 
taining PIK3CA, EGFR, FOXA1 and HER2, as well as focal deletions of 
regions containing MLL3, PTEN, RBI and MAP2K4 (Supplementary 
Fig. 9); in all cases, multiple genes were included within each altered 
region. Importantly, many of these copy number changes correlated 
with mRNA subtype including characteristic loss of 5q and gain of 
lOp in basal-like cancers 5,29 and gain of lq and/or 16q loss in luminal 
tumours 4 . NMF clustering of GISTIC segments identified five copy 
number clusters/groups that correlated with mRNA subtypes, ER, PR 
and HER2 clinical status, and TP53 mutation status (Supplementary 
Fig. 10). In addition, this aCGH subtype classification was highly 
correlated with the aCGH subtypes recently defined by ref. 30 
(Supplementary Fig. 11). 

Reverse phase protein arrays 

Quantified expression of 171 cancer-related proteins and phospho- 
proteins by RPPA was performed on 403 breast tumours 31 . 
Unsupervised hierarchical clustering analyses identified seven 
subtypes; one class contained too few cases for further analysis 
(Supplementary Fig. 12). These protein subtypes were highly 
concordant with the mRNA subtypes, particularly with basal-like 
and HER2E mRNA subtypes. Closer examination of the HER2- 
containing RPPA-defmed subgroup showed coordinated overexpres- 
sion of HER2 and EGFR with a strong concordance with phosphorylated 
HER2 (pY1248) and EGFR (pY992), probably from heterodimeriza- 
tion and cross-phosphorylation. Although there is a potential for 
modest cross reactivity of antibodies against these related total and 
phospho-proteins, the concordance of phosphorylation of HER2 and 
EGFR was confirmed using multiple independent antibodies. 

In RPPA-defined luminal tumours, there was high protein expres- 
sion of ER, PR, AR, BCL2, GATA3 and INPP4B, defining mostly 
luminal A cancers and a second more heterogeneous protein subgroup 
composed of both luminal A and luminal B cancers. Two potentially 
novel protein- defined subgroups were identified: reactive I consisted 
primarily of a subset of luminal A tumours, whereas reactive II 
consisted of a mixture of mRNA subtypes. These groups are termed 
'reactive' because many of the characteristic proteins are probably 
produced by the microenvironment and/or cancer- activated fibroblasts 
including fibronectin, caveolin 1 and collagen VI. These two RPPA 
groups did not have a marked difference in the percentage tumour cell 
content when compared to each other, or the other protein subtypes, 
as assessed by SNP array analysis or pathological examination. In 
addition, supervised analyses of reactive I versus II groups using 
miRNA expression, DNA methylation, mutation, or DNA copy 
number data identified no significant differences between these groups, 
whereas similar supervised analyses using protein and mRNA expres- 
sion identified many differences. 

Multiplatform subtype discovery 

To reveal higher- order structure in breast tumours based on multiple 
data types, significant clusters/subtypes from each of five platforms 
were analysed using a multiplatform data matrix subjected to 
unsupervised consensus clustering (Fig. 2). This 'cluster of clusters' 
(C-of-C) approach illustrated that basal-like cancers had the most 
distinct multiplatform signature as all the different platforms for 
the basal-like groups clustered together. To a great extent, the four 
major C-of-C subdivisions correlated well with the previously published 
mRNA subtypes (driven, in part, by the fact that the four intrinsic 
subtypes were one of the inputs). Therefore, we also performed 
C-of-C analysis with no mRNA data present (Supplementary Fig. 13) 
or with the 12 unsupervised mRNA subtypes (Supplementary Fig. 14), 
and in each case 4-6 groups were identified. Recent work identified ten 
copy- number-based subgroups in a 997 breast cancer set 30 . We evaluated 
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Figure 2 | Coordinated analysis of breast cancer subtypes defined from five 
different genomic/proteomic platforms, a, Consensus clustering analysis of 
the subtypes identifies four major groups (samples, n = 348). The blue and 
white heat map displays sample consensus, b, Heat-map display of the subtypes 
defined independently by miRNAs, DNA methylation, copy number (CN), 
PAM50 mRNA expression, and RPPA expression. The red bar indicates 
membership of a cluster type, c, Associations with molecular and clinical 
features. P values were calculated using a chi- squared test. 

this classification in a C-of-C analysis instead of our five-class copy 
number subtypes, with either the PAM50 (Supplementary Fig. 15) or 
12 unsupervised mRNA subtypes (Supplementary Fig. 16); each of 
these C-of-C classifications was highly correlated with PAM50 
mRNA subtypes and with the other C-of-C analyses (Fig. 2). The 
transcriptional profiling and RPPA platforms demonstrated a high 
correlation with the consensus structure, indicating that the informa- 
tion content from copy number aberrations, miRNAs and methylation 
is captured at the level of gene expression and protein function. 

Luminal /ER + summary analysis 

Luminal/ER + breast cancers are the most heterogeneous in terms of 
gene expression (Supplementary Fig. 5), mutation spectrum (Fig. 1), 
copy number changes (Supplementary Fig. 9) and patient outcomes 1 ' 14 . 
One of the most dominant features is high mRNA and protein expres- 
sion of the luminal expression signature (Supplementary Fig. 5), which 
contains ESR1, GAT A3, FOXA1, XBP1 and MYB; the luminal/ER + 
cluster also contained the largest number of significantly mutated 
genes. Most notably, GAT A3 and FOXA1 were mutated in a mutually 
exclusive fashion, whereas ESR1 and XBP1 were typically highly 
expressed but infrequently mutated. Mutations in RUNX1 and its 
dimerization partner CBFB may also have a role in aberrant ER 
signalling in luminal tumours, as RUNX1 functions as an ER 'DNA 
tethering factor' 32 . PARADIGM 33 analysis comparing luminal versus 
basal-like cancers further emphasized the presence of a hyperactivated 
FOXA1 -ER complex as a critical network hub differentiating these two 
tumour subtypes (Supplementary Fig. 17). 

A confirmatory finding here was the high mutation frequency 
of PIK3CA in luminal/ER + breast cancers 34 ' 35 . Through multiple 



technology platforms, we examined possible relationships between 
PIK3CA mutation, PTEN loss, INPP4B loss and multiple gene 
and protein expression signatures of pathway activity. RPPA data 
demonstrated that pAKT, pS6 and p4EBPl, typical markers of 
phosphatidylinositol-3-OH kinase (PI(3)K) pathway activation, were 
not elevated in PIOCA-mutated luminal A cancers; instead, they 
were highly expressed in basal-like and HER2E mRNA subtypes 
(the latter having frequent PIK3CA mutations) and correlated 
strongly with INPP4B and PTEN loss, and to a degree with PIK3CA 
amplification. Similarly, protein 36 and three mRNA signatures 37 " 39 of 
PI(3)K pathway activation were enriched in basal-like over luminal A 
cancers (Fig. 3a). This apparent disconnect between the presence of 
PIK3CA mutations and biomarkers of pathway activation has been 
previously noted 36 . 

Another striking luminal/ER + subtype finding was the frequent 
mutation of MAP3K1 and MAP2K4, which represent two contiguous 
steps within the p38-JNKl pathway 24 ' 40 . These mutations are predicted 
to be inactivating, with MAP2K4 also a target of focal DNA loss in 
luminal tumours (Supplementary Fig. 9). To explore the possible 
interplay between PIK3CA, MAP3K and MAP2K4 signalling, MEMo 
analysis 41 was performed to identify mutually exclusive alterations 
targeting frequently altered genes likely to belong to the same pathway 
(Fig. 4). Across all breast cancers, MEMo identified a set of modules 
that highlight the differential activation events within the receptor 
tyrosine kinase (RTK)-PI(3)K pathway (Fig. 4a); mutations of 
PIK3CA were very common in luminal/ER + cancers whereas PTEN 
loss was more common in basal-like tumours. Almost all MAP3K1 and 
MAP2K4 mutations were in luminal tumours, yet MAP3K1 and 
MAP2K4 appeared almost mutually exclusive relative to one another. 

The TP53 pathway was differentially inactivated in luminal/ER + 
breast cancers, with a low TP53 mutation frequency in luminal A 
(12%) and a higher frequency in luminal B (29%) cancers (Fig. 1). In 
addition to TP53 itself, a number of other pathway- inactivating events 
occurred including ATM loss and MDM2 amplification (Figs 3b and 
4b), both of which occurred more frequently within luminal B cancers. 
Gene expression analysis demonstrated that individual markers of 
functional TP53 (GADD45A and CDKN1A), and TP53 activity 42 ' 43 
signatures, were highest in luminal A cancers (Fig. 3b). These data 
indicate that the TP53 pathway remains largely intact in luminal A 
cancers but is often inactivated in the more aggressive luminal B 
cancers 44 . Other PARADIGM-based pathway differences driving 
luminal B versus luminal A included hyperactivation of transcriptional 
activity associated with MYC and FOXM1 proliferation. 

The critical retinoblastoma/RBl pathway also showed mRNA- 
subtype- specific alterations (Fig. 3c). RBI itself, by mRNA and 
protein expression, was detectable in most luminal cancers, with 
highest levels within luminal A. A common oncogenic event was 
cyclin Dl amplification and high expression, which preferentially 
occurred within luminal tumours, and more specifically within 
luminal B. In contrast, the presumed tumour suppressor CDKN2C 
(also called pl8) was at its lowest levels in luminal A cancers, con- 
sistent with observations in mouse models 45 . Finally, RBI activity 
signatures were also high in luminal cancers 46 " 48 . Luminal A tumours, 
which have the best prognosis, are the most likely to retain activity of 
the major tumour suppressors RBI and TP53. 

These genomic characterizations also provided clues for druggable 
targets. We compiled a drug target table in which we defined a target 
as a gene/protein for which there is an approved or investigational 
drug in human clinical trials targeting the molecule or canonical 
pathway (Supplementary Table 6). In luminal/ER + cancers, the high 
frequency of PIK3CA mutations suggests that inhibitors of this 
activated kinase or its signalling pathway may be beneficial. Other 
potential significantly mutated gene drug candidates include AKT1 
inhibitors (11 out of 12 AKT1 variants were luminal) and PARP 
inhibitors for BRCA1/BRCA2 mutations. Although still unapproved 
as biomarkers, many potential copy-number-based drug targets 
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Figure 3 | Integrated analysis of the PI(3)K,TP53 and RBI pathways. Breast 
cancer subtypes differ by genetic and genomic targeting events, with 
corresponding effects on pathway activity, a-c, For PI(3)K (a), TP53 (b) and 
RBI (c) pathways, key genes were selected using prior biological knowledge. 
Multiple mRNA expression signatures for a given pathway were defined 
(details in Supplementary Methods; PI(3)K:Saal, PTEN loss in human breast 
tumours; CMap, PI(3)K/mTOR inhibitor treatment in vitro; Majumder, Akt 
overexpression in mouse model; TP53: IARC, expert- cur ated p53 targets; GSK, 
TP53 mutant versus wild-type cell lines; KANNAN, TP53 overexpression in 
vitro; TROESTER, TP53 knockdown in vitro; RB: CHICAS, RBI mouse 
knockout versus wild type; LARA, RBI knockdown in vitro; 
HERSCHKOWITZ, RBI loss of heterozygosity (LOH) in human breast 
tumours) and applied to the gene expression data, in order to score each 
tumour for relative signature activity (yellow, more active). The PI(3)K panel 
includes a protein-based (RPPA) proteomic signature. Tumours were ordered 
first by mRNA subtype, although specific ordering differs between the panels. P 
values were calculated by a Pearson's correlation or a Chi- squared test. 



were identified including amplifications of fibroblast growth factor 
receptors (FGFRs) and IGFR1, as well as cyclin Dl, CDK4 and CDK6. 
A summary of the general findings in luminal tumours and the other 
subtypes is presented in Table 1. 

HER2 -based classifications and summary analysis 

DNA amplification of HER2 was readily evident in this study 
(Supplementary Fig. 9) together with overexpression of multiple 



HER2-amplicon-associated genes that in part define the HER2E 
mRNA subtype (Supplementary Fig. 5). However, not all clinically 
HER2 + tumours are of the HER2E mRNA subtype, and not all 
tumours in the HER2E mRNA subtype are clinically HER2 + . 
Integrated analysis of the RPPA and mRNA data clearly identified a 
HER2 + group (Supplementary Fig. 12). When the HER2 + protein 
and HER2E mRNA subtypes overlapped, a strong signal of EGFR, 
pEGFR, HER2 and pHER2 was observed. However, only —50% of 
clinically HER2 + tumours fall into this HER2E-mRNA-subtype/ 
HER2 -protein group, the rest of the clinically HER2 + tumours were 
observed predominantly in the luminal mRNA subtypes. 

These data indicate that there exist at least two types of clinically 
defined HER2 + tumours. To identify differences between these groups, 
a supervised gene expression analysis comparing 36 HER2E-mRNA- 
subtype/HER2 + versus 31 luminal-mRNA-subtype/HER2 + tumours 
was performed and identified 302 differentially expressed genes 
(g-value = 0%) (Supplementary Fig. 18 and Supplementary Table 7). 
These genes largely track with ER status but also indicated that HER2E- 
mRNA-subtype/HER2 + tumours showed significantly higher expres- 
sion of a number of RTKs including FGFR4, EGFR, HER2 itself, as well 
as genes within the HER2 amplicon (including GRB7). Conversely, the 
luminal-mRNA-subtype/HER2 + tumours showed higher expression 
of the luminal cluster of genes including GAT A3, BCL2 and ESRL 
Further support for two types of clinically defined HER2 + disease 
was evident in the somatic mutation data supervised by either 
mRNA subtype or ER status; TP53 mutations were significantly 
enriched in HER2E or ER-negative tumours whereas GAT A3 muta- 
tions were only observed in luminal subtypes or ER + tumours. 

Analysis of the RPPA data according to mRNA subtype identified 
36 differentially expressed proteins (g-value <5%) (Supplementary 
Fig. 18G and Supplementary Table 8). The EGFR/pEGFR/HER2/ 
pHER2 signal was again observed and present within the HER2E- 
mRNA-subtype/HER2 + tumours, as was high pSRC and pS6; con- 
versely, many protein markers of luminal cancers again distinguished 
the luminal-mRNA-subtype/HER2 + tumours. Given the importance 
of clinical HER2 status, a more focused analysis was performed based 
on the RPPA- defined protein expression of HER2 (Supplementary 
Fig. 19)— the results strongly recapitulated findings from the RPPA 
and mRNA subtypes including a high correlation between HER2 
clinical status, HER2 protein by RPPA, pHER2, EGFR and pEGFR. 
These multiple signatures, namely HER2E mRNA subtype, HER2 
amplicon genes by mRNA expression, and RPPA EGFR/pEGFR/ 
HER2/pHER2 signature, ultimately identify at least two groups/ 
subtypes within clinically HER2 + tumours (Table 1). These signatures 
represent breast cancer biomarker(s) that could potentially predict 
response to anti-HER2 targeted therapies. 

Many therapeutic advances have been made for clinically HER2 + 
disease. This study has identified additional somatic mutations that 
represent potential therapeutic targets within this group, including a 
high frequency of PIK3CA mutations (39%), a lower frequency of 
PTEN and PIK3R1 mutations (Supplementary Table 6), and genomic 
losses of PTEN and INPP4B. Other possible druggable mutations 
included variants within HER family members including two somatic 
mutations in HER2, two within EGFR, and five within HER3. 
Pertuzumab, in combination with trastuzumab, targets the HER2- 
HER3 heterodimer 49 ; however, these data suggest that targeting EGFR 
with HER2 could also be beneficial. Finally, the HER2E mRNA 
subtype typically showed high aneuploidy, the highest somatic 
mutation rate (Table 1), and DNA amplification of other potential 
therapeutic targets including FGFRs, EGFR, CDK4 and cyclin Dl. 

Basal-like summary analysis 

The basal-like subtype was discovered more than a decade ago by 
first-generation cDNA microarrays 13 . These tumours are often 
referred to as triple-negative breast cancers (TNBCs) because most 
basal-like tumours are typically negative for ER, PR and HER2. 
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Figure 4 | Mutual exclusivity modules in cancer (MEMo) analysis. Mutual 
exclusivity modules are represented by their gene components and connected 
to reflect their activity in distinct pathways. For each gene, the frequency of 
alteration in basal-like (right box) and non-basal (left box) is reported. Next to 
each module is a fingerprint indicating what specific alteration is observed for 
each gene (row) in each sample (column), a, MEMo identified several 
overlapping modules that recapitulate the RTK-PI(3)K and p38-JNKl 



signalling pathways and whose core was the top-scoring module, b, MEMo 
identified alterations to TP 53 signalling as occurring within a statistically 
significant mutually exclusive trend, c, A basal-like only MEMo analysis 
identified one module that included ATM mutations, defects at BRCA1 and 
BRCA2, and deregulation of the RBI pathway. A gene expression heat map is 
below the fingerprint to show expression levels. 



However, —75% of TNBCs are basal-like with the other 25% com- 
prised of all other mRNA subtypes 6 . In this data set, there was a high 
degree of overlap between these two distinctions with 76 TNBCs, 81 
basal-like, and 65 that were both TNBCs and basal-like. Given the 
known heterogeneity of TNBCs, and that the basal-like subtype 
proved to be distinct on every platform, we chose to use the basal-like 
distinction for comparative analyses. 

Basal-like tumours showed a high frequency of TP53 mutations 
(80%) 9 , which when combined with inferred TP53 pathway activity 
suggests that loss of TP 5 3 function occurs within most, if not all, 
basal-like cancers (Fig. 3b). In addition to loss of TP53, a MEMo 
analysis reconfirmed that loss of RBI and BRCA1 are basal-like 
features (Fig. 4c) 47 ' 50 . PIK3CA was the next most commonly mutated 
gene (—9%); however, inferred PI(3)K pathway activity, whether 
from gene 37 " 39 , protein 36 , or high PI(3)K/AKT pathway activities, 
was highest in basal-like cancers (Fig. 3a). Alternative means of 
activating the PI(3)K pathway in basal-like cancers probably includes 
loss of PTEN and INPP4B and/or amplification of PIK3CA. A recent 
paper 12 performed exome sequencing of 102 TNBCs. Five of the top 
six most frequent TNBC mutations in ref. 12 were also observed at a 
similar frequency in our TNBC subset (Myo3A not present here); of 
those five, three passed our test as a significantly mutated gene in 
TNBCs (Supplementary Table 2). 
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Expression features of basal-like tumours include a characteristic 
signature containing keratins 5, 6 and 17 and high expression of 
genes associated with cell proliferation (Supplementary Fig. 5). 
A PARADIGM 33 analysis of basal-like versus luminal tumours 
emphasized the importance of hyperactivated FOXM1 as a transcrip- 
tional driver of this enhanced proliferation signature (Supplementary 
Fig. 17). PARADIGM also identified hyperactivated MYC and HIF1- 
ot/ARNT network hubs as key regulatory features of basal-like 
cancers. Even though chromosome 8q24 is amplified across all 
subtypes (Supplementary Fig. 9), high MYC activation seems to be 
a basal-like characteristic 51 . 

Given the striking contrasts between basal-like and luminal/ 
HER2E subtypes, we performed a MEMo analysis on basal-like 
tumours alone. The top-scoring module included ATM mutations, 
BRCA1 and BRCA2 inactivation, RBI loss and cyclin El amplification 
(Fig. 4c). Notably, these same modules were identified previously for 
serous ovarian cancers 41 . Furthermore, the basal-like (and TNBC) 
mutation spectrum was reminiscent of the spectrum seen in serous 
ovarian cancers 52 with only one gene (that is, TP 53) at >10% muta- 
tion frequency. To explore possible similarities between serous 
ovarian and the breast basal-like cancers, we performed a number 
of analyses comparing ovarian versus breast luminal, ovarian versus 
breast basal-like, and breast basal-like versus breast luminal cancers 
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Table 1 | Highlights of genomic, clinical and proteomic features of subtypes 



Subtype 



Luminal B 



ER + /HER2~(%) 
HER2 + (%) 
TNBCs (%) 
TP53 pathway 



87 

7 

2 



TP53 mut (12%); gain of MDM2 
(14%) 

PIK3CA/PTEN pathway PIK3CA mut (49%); PTEN 



82 
15 
1 

TP53 mut (32%); gain of MDM2 
(31%) 

PIK3CA mut (32%) PTEN mut/loss 



mut/loss (13%); INPP4B loss (9%) (24%) INPP4B loss (16%) 



RBI pathway 

mRNA expression 
Copy number 

DNA mutations 
DNA methylation 
Protein expression 



Cyclin Dl amp (29%); CDK4 gain 
(14%); low expression of 
CDKN2C; high expression of RBI 
High ER cluster; low proliferation 

Most diploid; many with quiet 
genomes; lq, 8q, 8pl 1 gain; 8p, 
16q loss; llql3.3 amp (24%) 

PIK3CA (49%); TP53 (12%); 
GATA3 (14%); MAP3K1 (14%) 



High oestrogen signalling; high 
MYB; RPPA reactive subtypes 



Cyclin Dl amp (58%); CDK4 gain 
(25%) 

Lower ER cluster; high proliferation 

Most aneuploid; many with focal 
amp; lq, 8q, 8pl 1 gain; 8p, 16q 
loss; llql3.3amp(51%); 
8pll.23 amp (28%) 
TP53 (32%); PIK3CA (32%); 
MAP3K1 (5%) 

Hypermethylated phenotype for 
subset 

Less oestrogen signalling; high 
FOXM1 and MYC; RPPA reactive 
subtypes 



10 

2 

80 

TP53 mut (84%); gain of MDM2 
(14%) 

PIK3CA mut (7%); PTEN mut/loss 
(35%); INPP4B loss (30%) 

RBI mut/loss (20%); cyclin El 
amp (9%); high expression of 
CDKN2A; low expression of RBI 
Basal signature; high proliferation 

Most aneuploid; high genomic 
instability; lq, lOp gain; 8p, 5q 
loss; MYC focal gain (40%) 

TP53 (84%); PIK3CA (7%) 

Hypomethylated 

High expression of DNA repair 
proteins, PTEN and INPP4B loss 
signature (pAKT) 



20 
68 
9 

TP53 mut (75%); gain of 
MDM2 (30%) 
PIK3CAmut (42%); PTEN 
mut/loss (19%); INPP4B 
loss (30%) 

Cyclin Dl amp (38%); 
CDK4 gain (24%) 

HER2 amplicon signature; 
high proliferation 
Most aneuploid; high 
genomic instability; lq, 8q 
gain; 8p loss; 17ql2 focal 
ERRB2 amp (71%) 
TP53 (75%); PIK3CA 
(42%); PIK3R1 (8%) 



High protein and phospho- 
protein expression of EGFR 
and HER2 



Percentages are based on 466 tumour overlap list. Amp, amplification; mut, mutation. 



(Fig. 5). Comparing copy number landscapes, we observed several 
common features between ovarian and basal-like tumours including 
widespread genomic instability and common gains of lq, 3q, 8q and 
12p, and loss of 4q, 5q and 8p (Supplementary Fig. 20 A). Using a 
more global copy number comparison, we examined the overall 
fraction of the genome altered and the overall copy number correla- 
tion of ovarian cancers versus each breast cancer mRNA subtype 
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Figure 5 | Comparison of breast and serous ovarian carcinomas. 

a, Significantly enriched genomic alterations identified by comparing basal-like 
or serous ovarian tumours to luminal cancers, b, Inter-sample correlations 
(yellow, positive) between gene transcription profiles of breast tumours 
(columns; TCGA data, arranged by subtype) and profiles of cancers from 
various tissues of origin (rows; external 'TGEN expO' data set, GSE2109) 
including ovarian cancers. 



(Supplementary Fig. 20 A, B); in both cases, basal-like tumours were 
the most similar to the serous ovarian carcinomas. 

We systematically looked for other common features between 
serous ovarian and basal-like tumours when each was compared to 
luminal. We identified: (1) BRCA1 inactivation; (2) RBI loss and 
cyclin El amplification; (3) high expression of AKT3; (4) MYC 
amplification and high expression; and (5) a high frequency of 
TP 53 mutations (Fig. 5a). An additional supervised analysis of a large, 
external multitumour type transcriptomic data set (Gene Expression 
Omnibus accession GSE2109) was performed where each TCGA 
(The Cancer Genome Atlas) breast tumour expression profile was 
compared via a correlation analysis to that of each tumour in the 
multitumour set. Basal-like breast cancers clearly showed high 
mRNA expression correlations with serous ovarian cancers, as well 
as with lung squamous carcinomas (Fig. 5b). A PARADIGM analysis 
that calculates whether a gene or pathway feature is both differentially 
activated in basal-like versus luminal cancers and has higher overall 
activity across the TCGA ovarian samples was performed; this iden- 
tified comparably high pathway activity of the HIFl-a/ARNT, MYC 
and FOXM1 regulatory hubs in both ovarian and basal-like cancers 
(Supplementary Fig. 20C). The common findings of TP53, RBI and 
BRCA1 loss, with MYC amplification, strongly suggest that these are 
shared driving events for basal-like and serous ovarian carcinogenesis. 
This suggests that common therapeutic approaches should be con- 
sidered, which is supported by the activity of platinum analogues and 
taxanes in breast basal-like and serous ovarian cancers. 

Given that most basal-like cancers are TNBCs, finding new drug 
targets for this group is critical. Unfortunately, the somatic mutation 
repertoire for basal-like breast cancers has not provided a common 
target aside from BRCA1 and BRCA2. Here we note that -20% of 
basal-like tumours had a germline (n = 12) and/or somatic (n = 8) 
BRCA1 or BRCA2 variant, which suggests that one in five basal-like 
patients might benefit from PARP inhibitors and/or platinum com- 
pounds 53 ' 54 . The copy number landscape of basal-like cancers showed 
multiple amplifications and deletions, some of which may provide 
therapeutic targets (Supplementary Table 6). Potential targets include 
losses of PTEN and INPP4B, both of which have been shown to 
sensitize cell lines to PI(3)K pathway inhibitors 55 ' 56 . Interestingly, 
many of the components of the PI(3)K and RAS-RAF-MEK pathway 
were amplified (but not typically mutated) in basal-like cancers 
including PIK3CA (49%), KRAS (32%), BRAF (30%) and EGFR 
(23%). Other RTKs that are plausible drug targets and amplified in 
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some basal-like cancers include FGFR1, FGFR2, IGFR1, KIT, MET 
and PDGFRA. Finally, the PARADIGM identification of high HIF1- 
ot/ARNT pathway activity suggests that these malignancies might be 
susceptible to angiogenesis inhibitors and/or bioreductive drugs that 
become activated under hypoxic conditions. 

Concluding remarks 

The integrated molecular analyses of breast carcinomas that we 
report here significantly extends our knowledge base to produce a 
comprehensive catalogue of likely genomic drivers of the most 
common breast cancer subtypes (Table 1). Our novel observation that 
diverse genetic and epigenetic alterations converge phenotypically 
into four main breast cancer classes is not only consistent with con- 
vergent evolution of gene circuits, as seen across multiple organisms, 
but also with models of breast cancer clonal expansion and in vivo cell 
selection proposed to explain the phenotypic heterogeneity observed 
within defined breast cancer subtypes. 

METHODS SUMMARY 

Specimens were obtained from patients with appropriate consent from institutional 
review boards. Using a co-isolation protocol, DNA and RNA were purified. In total, 
800 patients were assayed on at least one platform. Different numbers of patients 
were used for each platform using the largest number of patients available at the 
time of data freeze; 466 samples (463 patients) were in common across 5 out of 6 
platforms (excluding RPPA) and 348 patients were in common on 6 out of 6 
platforms. Technology platforms used include: (1) gene expression DNA micro- 
arrays 52 ; (2) DNA methylation arrays; (3) miRNA sequencing; (4) Affymetrix SNP 
arrays; (5) exome sequencing; and (6) reverse phase protein arrays. Each platform, 
except for the exome sequencing, was used in a de novo subtype discovery analysis 
(Supplementary Methods) and then included in a single analysis to define an overall 
subtype architecture. Additional integrated across -platform computational ana- 
lyses were preformed including PARADIGM 33 and MEMo 41 . 
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